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METHOD AND SYSTEM FOR DIAGNOSTICS 
OF APPARATUS 

GOVERNMENT RIGHTS 

5 

This invention was made with Government support under 
contract number NCC2-1426 awarded by NASA. The Gov- 
ernment has certain rights in this invention. 

FIELD OF THE INVENTION 10 

The present invention is related generally to diagnostic and 
monitoring systems wherein there is a combination of com- 
puting device, which implements a computational method for 
diagnostics and a device or apparatus diagnosed and moni- 
tored thereby. “Apparatus” here may include a machine, an 
industrial plant, a vehicle, manufacturing process, facility, 
utility system, or other engineered system. “Diagnostics” 
here is defined as determining whether an apparatus is oper- 2 o 
ating normally and, if not, determining more detained infor- 
mation about the fault or failure experienced by the apparatus 
from the available data. 

BACKGROUND OF THE INVENTION 25 

Modem apparatuses such as aerospace or ground vehicles, 
propulsion systems, electrical power systems, industrial 
equipment and machines, etc are outfitted with sensors and 
digital processors for control and monitoring of the appara- 30 
tuses. Diagnostics functions are control and monitoring func- 
tions that have to do with an abnormal, faulty, operation of the 
apparatus. The diagnostics functions could be used for 
improving safety of the system operation, e.g., by halting the 
operation or by switching to a degraded mode of operation; 35 
for improving system performance, e.g., by adapting the sys- 
tem control or scheduling; and for facilitating maintenance 
and repair, e.g., by providing a guidance on which of a plu- 
rality of possible maintenance actions should be undertaken. 

The discussion below is related to computational method 40 
for diagnostic estimation. “Diagnostic estimation” here is 
defined as a diagnostic function determining which of plural- 
ity of fault conditions exist in the apparatus, e.g., which part 
of the apparatus is faulty, and further determining fault states. 
The “fault states” here are defined as quantitative character- 45 
istics of the fault conditions, e.g., an extent or degree of 
damage of the part. Parametric fault states are the fault states 
described by real numbers and discrete fault states are the 
fault states described by binaries, e.g., 0/1 or true/false, or by 
integer numbers. Diagnostic estimation computes statistical 50 
estimates of the fault states of the apparatus from the available 
apparatus data. 

Most of embedded digital electronics used in modern sys- 
tems includes low-level diagnostics functions. Such prior art 
diagnostic functions are usually univariate, i.e., each function 55 
is performed by observing a single signal. The univariate 
approach is usually adequate for detecting and diagnosing 
faults localized at a single embedded sensor or single control 
loop in the system. However, there is also an important need 
for detecting and diagnosing faults, which are not localized 60 
and that simultaneously impact readings of many sensors 
and/or outputs of many low-level diagnostic functions. Some 
of the prior art diagnostic systems integrate and process 
simultaneous fault diagnostics codes by using AI (artificial 
intelligence) reasoning methods. The AI methods are based 65 
on suboptimal heuristics because they are dealing with hard 
problems that have combinatorial complexity. 


2 

Some of the background technology includes multivari- 
able model -based diagnostic estimation, which integrates and 
fuses the data from multiple sensors and multiple low-level 
diagnostics codes. Multivariable model-based methods are 
known in advanced control systems area. Though the multi - 
variable advanced control methods pursue a different prob- 
lem (control rather than the diagnostic estimation) they are 
related to the proposed method in the computational approach 
part. Much of prior multivariable control work is based on 
linear models and linear analysis methods. The linear models 
and methods cannot adequately address the multivariable 
diagnostic estimation problems because they cannot deal 
with nonlinearities, constraints, and system structure 
changes. A more relevant prior art is Model Predictive Con- 
trol (MPC), which is a control method overwhelmingly used 
in process industries. MPC computes control at each time step 
by solving a batch optimization problem over a moving pre- 
diction horizon. The important advantages of MPC are that (i) 
it can handle constraints on the control or system variables 
and (ii) it can handle system structure changes such as miss- 
ing sensor data and off-control actuators. 

Conventional technology related to the subject of this 
invention includes Moving Horizon Estimation (MHE) algo- 
rithms. MHE is based on ideas related to the MPC but is 
aimed at estimation rather than at control problems. MHE 
computes estimates of hidden state parameters by solving a 
batch optimization problem over a moving horizon of past 
observation data; MHE optimizes model fit to the observation 
data. The MHE or a related optimization based approach can 
be used for multivariable diagnostics estimation, but there are 
two difficulties that need to be overcome. One difficulty is that 
the fault estimation problems are nonlinear. Another diffi- 
culty is that these problems include discrete variables that 
describe the presence or absence of faults. Incorrect estima- 
tion of these discrete variables could lead to false positives 
and false negatives in the fault detection; both types of errors 
are undesirable. Presence of the discrete variables could lead 
to combinatorial complexity of the problem. 

MHE or other optimization-based methods can be imple- 
mented for diagnostic estimation by computing an optimal 
estimate of the fault parameters using standard algorithms for 
solving mixed problems with parametric and discrete esti- 
mated variables. Such algorithms used in the prior art include 
GA (Genetic Algorithms), MIQP (Mixed-integer Quadratic 
Programming), and MILP (Mixed-integer Linear Program- 
ming). Besides being inherently suboptimal in dealing with 
the combinatorial complexity, these methods are slow and, 
thus, not suitable for real time use in an embedded system or 
for centralized data processing for a large fleet of monitored 
devices (e.g., monitoring a fleet of engines). For example, 
U.S. Pat. No. 6,606,580 indicates that using GA optimization 
method for diagnostic estimation of faults in turbine engine 
required about an hour of computations. 

SUMMARY OF THE INVENTION 

The inventive methodology is directed to methods and 
systems that substantially obviate one or more of the above 
and other problems associated with conventional techniques 
for diagnostic and monitoring. 

An embodiment of the invention addresses the problem of 
integrated multivariable diagnostics by proposing a new 
method that includes optimization-based approach for diag- 
nostic estimation. Multivariable diagnostic estimation prob- 
lems are hard computationally and analytically because of 
nonlinearity and discrete nature of faults; this invention 
teaches a method for diagnostics estimation that overcomes 
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the difficulties. The proposed method is rigorous; it has supe- 
rior sensitivity, accuracy, conceptual simplicity, and compu- 
tational performance compared to the prior art multivariable 
diagnostic methods. 

It is an aspect of the present invention to provide a method 
for diagnostic estimation of fault states of an apparatus. The 
proposed fault diagnostics method is preferably implemented 
in software and can be adapted to work with different types of 
applications (faults, apparatuses, and systems) by changing 
initial data processing step of the method, fault signatures 
used in the method, and other configurable parts of the 
method. The method performs apparatus monitoring and for 
that purpose it is intended to execute repeatedly and periodi- 
cally after obtaining additional data from the sensors attached 
to the apparatus or its environment. The advantages of the 
proposed method that will become clear from the detailed 
description set forth below include accuracy, computational 
efficiency, conceptual clarity, implementation convenience, 
and maintainability after initial implementation has been 
completed. 

The proposed method computes diagnostic estimates of 
faults of an apparatus, which can be in either a no-fault con- 
dition or one or more fault conditions. The apparatus com- 
prises apparatus condition sensors connected to a computer 
processor, which implements the method. At each time (ex- 
ecution) period t, the method comprises the following four 
steps. Step 1 : processing data from the sensors to obtain a set 
of parameters y, known as parity parameters, which reflect 
apparatus condition deviation from normality. Step 2: collect- 
ing the parity parameters y over a moving horizon interval of 
time of a fixed maximal duration and ending at time period t 
in a data vector Y(t); Step 3: computing estimates of the fault 
conditions and likelihood parameters for each of the fault 
conditions; Step 4: transmitting the computed estimates of the 
fault conditions to a display device or to an automated deci- 
sion and control system or storing the computed estimates in 
memory. 

In Step 1 parameters y might be obtained from the sensor 
data as prediction residuals: differences of the observed sen- 
sor readings and the readings predicted for apparatus model 
which receives the same inputs as the apparatus. Different 
types of the apparatus model can be used such as a dynamic 
model, a nonlinear map, a set of static values corresponding to 
a chosen steady state regime, or another computer simulation 
model of the apparatus. 

The fault condition k at time period t, which is estimated at 
Step 3, is characterized by fault intensity parameter x^(t) and 
the fault signature corresponding to the fault condition k is 
known at the time of the method application. The fault sig- 
natures can be obtained as responses observed in the data y 
when a fault occurs or as approximations of such responses. 

The computation of diagnostic estimates for faults, which 
is performed at Step 3, comprises computing estimates of the 
fault intensity parameters x^(t) over the moving horizon inter- 
val of time and likelihood parameters p^ for each fault condi- 
tion k. Step 3 computations are done for one fault condition k 
at a time in two sub-steps: first by employing a ‘formulator’ 
and then by employing an ‘optimizer’. The formulator is a 
software module, which formulates a convex optimization 
program for fault condition using the moving horizon data 
vector Y(t). The optimizer is a software module, which 
numerically finds the solution of the convex optimization 
program encoded by the formulator; the solution is computed 
with a pre-defined accuracy for fault condition k. The convex 
program for the fault condition can include additional deci- 
sion variables in addition to the fault intensity parameters 

%(t)- 
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The convex optimization program encoded by the formu- 
lator, and solved by the optimizer can have one of the known 
forms, for which efficient optimization solvers are known, 
such as an isotonic or monotonic regression program, a 
5 univariate convex program, a Quadratic Program, a Linear 
Program, a Second-order Cone Program. These and many 
other types of constrained convex optimization programs can 
be efficiently solved using an interior-point method or other 
suitable convex optimization method. An important advan- 
1 0 tage of the proposed approach is that it could be set up to allow 
formulating and solving the convex optimization problem if 
one or more of the components of vector Y(t) is missing or 
unavailable. 

15 The diagnostic estimates for faults obtained at Step 3 and 
used at Step 4 comprise estimates of fault condition intensity 
parameters x*(t) over the moving horizon interval of time 
computed as the optimal solution and likelihood parameters 
p^ computed as the optimum value of the program. The fault 
20 condition parameters x*(t) and likelihood parameters p k com- 
puted by the optimizer can be used for determining one or 
several most likely candidate fault conditions (by sorting the 
likelihoods) and intensities of these conditions. This informa- 
tion can be employed for improving safety of the apparatus 
25 operation by halting or reconfiguring the operation in an event 
of the fault. The reconfiguration can be also used for improv- 
ing apparatus performance. Alternatively, knowing which 
fault might have most likely occurred can be used for sched- 
uling a correct maintenance action with reduced trouble- 
30 shooting effort. 

The proposed method can be implemented on-line in a 
computer or computers connected to the sensors of the appa- 
ratus or it can be implemented off-line by collecting data from 
the apparatus, transmitting it by electronic means to a com- 
35 puter implementing the method, and performing the method 
computations at a later time. The present invention also 
encompasses a system for diagnostics of an apparatus; the 
system implements the proposed method. The invention also 
encompasses a software program product comprising com- 
40 puter readable media implementing the proposed method. 

The proposed method, or a system implementing the 
method, or a software program product implementing the 
method are generic and with proper adjustment, tuning, con- 
figuration, and integration can be used in many different 
45 applications for many types of apparatuses. Three possible 
embodiments of the method described in detail below include 
monitoring solid rocket motor of launch vehicle for improv- 
ing safety of manned space flight, monitoring jet engines to 
improve aircraft servicing and maintenance, and monitoring 
50 semiconductor manufacturing tool to improve its perfor- 
mance. The proposed invention can be used in many addi- 
tional applications. These applications include but are not 
limited to heating, ventilation, and air conditioning equip- 
ment, chillers, and refrigerators; oil drilling rigs; various air- 
55 craft systems, including the propulsion system; ground 
vehicles (cars, tracks, and military vehicles) and their sys- 
tems; industrial manufacturing processes, such as refineries 
and pulp and paper plants; and other. 

Important advantages of the proposed method include but 
60 are not limited to the following: 

i. A convex program formulated by the formulator and 
solved by the optimizer is guaranteed to have a single global 
solution that can be efficiently computed. 

ii. A convex program can be solved very fast, especially for 
65 specialized forms of such problem that are described in 

detailed description below. In particular, such solution can be 
implemented in a real-time system. 
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iii. Encoding the problem as a convex program allows 
explicit implementation of constraints on the decision vari- 
ables. 

iv. The method can accommodate nonlinearities by 
employing models obtained by linearization at different con- 5 
ditions as separate fault models; an example is discussed in 
the detailed description. 

v. The method includes computation of the fault state like- 
lihood as a byproduct of the optimization; this enables com- 
puting fault ambiguity group by sorting the likelihoods and to 
thresholding them. 

vi. The method can accommodate missing data; this can be 
done by formulator dropping the terms with the missing data 
in the optimization problem 

Additional aspects related to the invention will be set forth 15 
in part in the description which follows, and in part will be 
obvious from the description, or may be learned by practice of 
the invention. Aspects of the invention may be realized and 
attained by means of the elements and combinations of vari- 
ous elements and aspects particularly pointed out in the fol- 20 
lowing detailed description and the appended claims. 

It is to be understood that both the foregoing and the fol- 
lowing descriptions are exemplary and explanatory only and 
are not intended to limit the claimed invention or application 
thereof in any manner whatsoever. 25 

BRIEF DESCRIPTION OF THE DRAWINGS 

The accompanying drawings, which are incorporated in 
and constitute a part of this specification exemplify the 30 
embodiments of the present invention and, together with the 
description, serve to explain and illustrate principles of the 
inventive technique. Specifically: 

FIG. lisa block diagram which illustrates functionality of 
a representative monitoring and control system implementing 35 
the method proposed by this invention. 

FIG. 2 is a block diagram which depicts overall function- 
ality and component parts (steps) of the method proposed by 
this invention. 

FIG. 3 is a block diagram which depicts overall function- 40 
ality and component parts (steps) of the optimization based 
estimation algorithm in the center of the method proposed by 
this invention. 

FIG. 4 is a schematic picture illustrating the augmentation 
force and moment for the crew launch vehicle thrust augmen- 45 
tation fault. 

FIG. 5 is a block diagram which illustrates computation of 
angular and linear acceleration residuals as parity variables 
for the crew launch vehicle. 

FIG. 6 is a schematic picture which illustrates the dis- 50 
cretized fault modeling for the crew launch vehicle thrust 
augmentation. 

FIG. 7 is a chart which shows simulation results for the 
flight of crew launch vehicle with a case breach fault includ- 
ing angular and linear accelerations and the angular and linear 55 
acceleration residuals. 

FIG. 8 is a chart which shows negative log-likelihood plots 
for different fault hypothesis corresponding to the simulation 
results in FIG. 7. 

FIG. 9 is a block diagram which illustrates diagnostic 60 
estimation for a turbine engine. 

FIG. 10 is a chart which shows example results of diagnos- 
tic estimation for a turbine engine. 

FIG. 11 is a chart which illustrates fault signatures for 
etching tool. 65 

FIG. 12: is a chart which illustrates run to run data with 
faults for etching tool. 
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FIG. 13: is a chart which illustrates estimated fault inten- 
sity and seeded fault for etching tool. 

FIG. 14 is a block diagram that illustrates an embodiment 
of a computer/server system upon which an embodiment of 
the inventive methodology may be implemented. 

DETAILED DESCRIPTION 

In the following detailed description, reference will be 
made to the accompanying drawing(s), in which identical 
functional elements are designated with like numerals. The 
aforementioned accompanying drawings show by way of 
illustration, and not by way of limitation, specific embodi- 
ments and implementations consistent with principles of the 
present invention. These implementations are described in 
sufficient detail to enable those skilled in the art to practice the 
invention and it is to be understood that other implementa- 
tions may be utilized and that structural changes and/or sub- 
stitutions of various elements may be made without departing 
from the scope and spirit of present invention. The following 
detailed description is, therefore, not to be construed in a 
limited sense. Additionally, the various embodiments of the 
invention as described may be implemented in the form of 
software running on a general purpose computer, in the form 
of a specialized hardware, or combination of software and 
hardware. 

This invention claims a method for diagnostic estimation 
of fault states of an apparatus; the method can be imple- 
mented as a part of monitoring system or as a software pro- 
gram product. The proposed fault diagnostics method is pref- 
erably implemented in software and can be adapted to work 
with different types of applications (faults, apparatuses, and 
systems) by changing initial data processing step of the 
method, fault signatures used in the method, and other con- 
figurable parts of the method. The embodiments described 
below describe examples of the apparatuses for which this 
method can be implemented; the method is not limited to 
these example apparatuses. 

One preferred embodiment of the invention set forth below 
is aimed at enhancing safety of human space flight by moni- 
toring solid rocket motor (SRM) propulsion of a crew launch 
vehicle. The second preferred embodiment is aimed at 
improving the accuracy of maintenance actions in servicing 
aircraft jet engines. The third preferred embodiment is aimed 
at improving performance of a semiconductor manufacturing 
tool, such as an etch tool. FIG. 1, FIG. 2, and FIG. 3 each 
relate to all preferred embodiments described below. 

FIG. 1 illustrates a preferred embodiment with diagnostics 
system 10 receiving data from apparatus 20 and providing the 
fault state estimates to fault tolerance system 20 and decision 
support system 90. Apparatus 20 can be any engineering 
system: propulsion system, engine, ground, air, space, or 
water vehicle, machine, device, electrical power system, 
semiconductor manufacturing tool, HVAC equipment, com- 
puter network, etc. In one preferred embodiment, apparatus 
20 is propulsion system of a rocket launch vehicle (rocket 
motor) or of an aircraft (jet engine). In another preferred 
embodiment, apparatus 20 is a semiconductor manufacturing 
tool, such as an etch tool. The proposed invention is appli- 
cable to different types of system including but not limited to 
the systems described in detail below. Diagnostic system 10 
implements the proposed method and can be any suitable 
programmable computing device such as a general-purpose 
desktop computer, mainframe computer, server, avionics 
module, engine control unit, embedded processor, or FPGA 
device. 
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Apparatus 20 includes one or more sensors 30; it could also 
include one or more actuators 40. Embedded electronics 50 
interfaces with sensors 30 and actuators 40. Embedded elec- 
tronics 50 also interfaces with control system 60, which 
would be normally implemented in an embedded computer, 
and with diagnostic system 10. Some of the sensors 30 could 
be used for diagnostics only and not used by control system 
60. 

In some embodiments, the diagnosed fault state of the 
system or determined ‘no fault’ state is transmitted to fault 
tolerance system 80, which might perform system reconfigu- 
ration, such as switching to backup hardware, or implement- 
ing a mission management action, such as mission re-plan- 
ning. FIG. 1 shows fault tolerance system 80 interfaced with 
control system 60. Control system commands that influence 
apparatus 20 are also transmitted to diagnostic system 10. 

In some embodiments, the diagnosed fault state of the 
system is transmitted to decision support system 90, which 
processes the diagnostic estimates and prepares operator 
advisory information to be shown in operator display 70. The 
displayed advisory can be used for on-line decision, e.g., by 
an aircraft pilot or a tool operator, or off-line, e.g., by main- 
tenance personnel for deciding which of the plurality of pos- 
sible maintenance and troubleshooting actions should be 
undertaken. 

FIG. 2 illustrates the preferred embodiment of the diagnos- 
tics system 10 in more detail. The system is engaged periodi- 
cally at a time period that is known. FIG. 2 illustrates perfor- 
mance of diagnostic system 10 at time period with a 
sequential number t. Apparatus data 100, which serve as the 
input, include the data obtained by diagnostics system 10 
from embedded electronics 50 of the apparatus and from 
control system 60. Apparatus data 100 are used to compute 
parity variables 110. 

Computation of parity variables is a part of the proposed 
method that might differ substantially between different types 
of apparatuses and between different apparatuses of the same 
type, e.g., between different jet engines. The computation is 
based on a model of apparatus; different types of models 
might be used including static or dynamic physics-based 
models, neural network maps, or process recipes. Parity vari- 
ables are herein defined as such transformations of the appa- 
ratus data, which are supposed to be zero according to the 
model of the apparatus nominal operation. In the first pre- 
ferred embodiment (the SRM case breach detection), the 
parity variables are the differences between the predicted and 
actually measured accelerations of the rocket. In the second 
preferred embodiment (the turbine engine diagnostics), the 
parity variables comprise the differences between the actual 
engine output data and the data predicted based on the engine 
model, which has the same inputs as the actual engine; the 
parity variables further comprise discrete fault flags com- 
puted by lower-level fault detection logic. In the third pre- 
ferred embodiment (the semiconductor manufacturing tool), 
the parity variables are the deviations of the measured vari- 
ables from the set values in the recipe. 

The parity variables are collected in a moving horizon data 
set Y (t) 125, which includes the parity variables obtained over 
the horizon of length of last n time periods of the diagnostic 
estimation. In one embodiment, the data computed at the 
previous cycles are stored in the memory 120 and included 
into the moving horizon data set 125. An important and useful 
feature of the proposed invention is that is can be used even if 
a part of the data in the set 125 is missing; the data can be 
missing because the sensors readings are lost or unavailable 
to diagnostics system or because some sensors fail. As a 
special case, the horizon length can be n=l, in which case 
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only the most recent data are used for diagnostics. The mov- 
ing horizon length n can be determined as a part of system 
engineering tradeoffs; the proposed invention does can be 
implemented with different lengths n of the moving horizon. 
5 Data set Y (t) 125 serves as the main input into estimation of 

fault condition and likelihoods 130; the said estimation is 
detailed in FIG. 3 and the description of FIG. 3 below. As 
mentioned above, the invention assumes that there are K 
different fault states. The proposed method determines which 
10 fault state actually exists and what fault intensity is. Estima- 
tion 130 produces an output data set 150 that includes fault 
intensity parameter vectors X k and likelihood values p^ for 
each of the K possible fault states. Sorter 140 ranks the 
15 estimates in the order of decreasing likelihood and selects a 
short list of the faults that are most likely to have occurred. 
The short list is also known as an ambiguity group. If the 
apparatus is determined to be in the ‘no-fault’ state, the ambi- 
guity group contains no (zero) faults states. In the preferred 
20 embodiment the short list includes data for the faults with the 
likelihoods above a given threshold. 

FIG. 3 illustrates the preferred embodiment of the method 
for batch estimation 130 from the moving horizon data Y(t) 
1 25 at a given time period t of operation of diagnostics system 
25 10. Faults signatures 210 are shown as an additional input to 
the estimation. In the first preferred embodiment (the SRM 
case breach detection), fault signatures 210 can be computed 
based on fault number using a simple formula. In the second 
preferred embodiment (the turbine engine diagnostics), fault 
30 signatures 210 are computed using a model of the engine. In 
the third preferred embodiment (the semiconductor manufac- 
turing tool), fault signatures 210 are computed off-line from 
the historical fault data and are stored in a table. 

35 The estimation computations 230 start by performing ini- 
tialization 200. The initialization resets the fault state number 
k to k=l . In one embodiment, the initialization includes com- 
puting likelihood of the ‘no-fault’ hypothesis. The computa- 
tions are performed in cycle 230 for one fault condition at a 
40 time. For a given fault condition k, fault signature S k 245 is 
obtained by selector 285 from fault signatures 210. Data Y(t) 
125 and fault signature 245 are provided as inputs to 
formulator 240. 

An output of formulator 240 is a convex program for the 
45 optimal estimation of the fault intensity parameter vector X k 
and vector of additional decision variables U. In one embodi- 
ment, U includes the hidden states of the system that need to 
be determined along with X. 

Formulator 240 sets up the structure and parameters of the 
50 program such as matrices, vectors, and parameters of the 
optimized loss function and constraints. In the preferred 
embodiment, the loss function for fault k denoted as L k is the 
negative log-likelihood index for a posteriori probability of 
55 fault k. In accordance with the Bayes formula, the loss index 
is 

-log P(X h U\ Y)=- log P(Y\X h U )- log P(X h U)+c (1) 

where Y=Y(t), the conditional probability P(YIX^,U) is 
60 known as observation model, and the conditional probability 
P(X fr ,U) is known as prior model. The constant c can be used 
for normalizing the log posterior index (1) to be the actual 
likelihood of the fault (in that case c=log P(Y)) or likelihood 
ratio relative to the ‘no-fault’ hypothesis (in that case c=log 
65 P(YI0)+log P(0) with 0 denoting the ‘no-fault’ hypothesis). 
The loss index L^(X^, U)=-log P(X^,UIY) is minimal when 
the posterior probability P(X^,UIY) is maximal; here U is a 
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vector of additional parameters present in the problem. The 
problem of fault estimation can be stated as 

minimize L k (X k ,U) (2) 

subject to {X h U}e (3). 5 

The specific form of loss index L^(X^,U) depends on the 
fault stochastic model and has significant degree of flexibility. 
The proposed invention requires the loss index L^pC^U), 
which is a convex function. The loss index is defined in a 
convex domain which reflects the known constraints on 10 
the decision variables X k and U. 

The invention is not limited to one particular type of the 
loss index (2) and one particular set of constraints (3). It 
includes the preferred embodiments of (2)-(3) that are dis- 
cussed below; it also includes different combinations and 15 
extensions of these embodiments that yield convex problems 
of the form (2)- (3) as anyone versed in the art would recog- 
nize. A specific choice of the optimization problem formula- 
tion and of the parameters of such formulation is a matter of 
detailed system engineering, such detail are outside of what is 20 
claimed by this invention. A detailed formulation can be taken 
from the published literature or established for the system in 
hand in a custom way. 

Data set Y(t) 125 obtained by moving horizon data collec- 
tion can be mathematically presented in the form 

Y(t)=col[y{t-n) . . . y{t)\ (4) 

where y(t) is the parity vector, n is the horizon length 
parameter, and t is the time step (sample) number. In one 
embodiment, the fault intensity parameter vector X k can be 30 
represented in the form 

X k =colfx(t-n) . . .*(*)], (5) 

where x(t) is a scalar parameter defining fault intensity. The 
probabilistic observation model P(YIX Ar ) in (2) is expressed as 35 

y(t)=SkX(t)+w(t), ( 6 ) 

where is the fault signature vector and w(t) is indepen- 
dent identically distributed observation noise sequence. The 
probabilistic prior model P(X A: ) in (2) is expressed as 40 

x(r+l)=x(r)+v(r), (7) 

where v(t) is independent identically distributed process 
noise sequence. In the first preferred embodiment, assuming 
that w(t) follows a Gaussian distribution and v(t) follows a 45 
one-sided exponential distribution. Such formulation models 
monotonic irreversible accumulation of the damage x(t) and 
yields convex problem (2)-(3) of the following specific form 

minimize 1 /2Q2 f [y(t)-Spc(t)] 2 +R2^[x(t)-x(t-l)], (8) 

50 

subject to x(r-l )-*(?) =0, (9) 

where Q is the covariance of the Gaussian noise w(t) and R 
is the first momentum of the exponential noise v(t) indepen- 
dent identically. The problem (2)-(3) is an isotonic regression 
problem for which very fast specialized solution methods are 55 
available. 

Another embodiment of the probabilistic formulation uses 
a second-order dynamic model instead of the first-order 
model (6)-(7) 


X0 = ‘S'**2(0+w ; (4 

(10) 

*i(r+l)^i(r)+vi(r), 

(11) 

x 2 (M-1)=*i (t)+x 2 (t)+v 2 (t), 

(12) 


where x : (t) can have a meaning of primary damage and 
x 2 (t) has a meaning of secondary damage, which is observ- 
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able through y(t); the rate of the secondary damage accumu- 
lation is proportional to the primary damage intensity at that 
time. In this embodiment, the values x 2 (t) in the model (10)- 

(12) contribute to the vector X k (5) and the values x x (t) are 
included into the vector U of additional decision parameters 
in the problem (2)- (3). 

Other embodiments can use different formulations of the 
convex estimation problem obtained by assuming that v(t) 
and w(t) are either gaussian, or uniform bounded, or Lapla- 
cian, or one-sided exponential noises in the first-order model 
(6)-(7), the second-order model (8)-(10), or in other linear 
state-space models with additive noises that might be used to 
formulate the probabilistic models in (1). 

The second preferred embodiment considers a probabilis- 
tic formulation where some of the observed parity variables 
y(t) are discrete variables taking values 0 or 1 only. Such 
discrete variables correspond to fault warning flags known as 
BIT (Built-in-test) flags generated by low-level electronics 
hardware or software in a majority of embedded control and 
monitoring systems. One embodiment uses probabilistic for- 
mulation with discrete variables that has the form. 

X0=ei^(0+wY)L (13) 

where 0 X is the component- wise Heaviside function with a 
unit threshold and other variables have the same meaning as 
discussed above. A modification of the model (13) is the 
discrete model 

yW-QrfSxW+wm, (14) 

where S is a matrix, x(t) and w(t) are vectors and the 
absolute values are computed component wise. Formulation 
(1 4) models sensor failure that is manifested as a sensor offset 
and also might cause a BIT fault flag to be set. Formulations 

(13) and (14) yield convex terms -log P(YIXj in the negative 
log-likelihood index (1). In particular for formulation (13) 

-log P(y=l be)— log(2-4>[(l-5*xy4/-®[(l-hS*x)/g]), (15) 

-log P(r=0lx)=-log(l + ^[(l-^)/ ? 7 + O[(l+5^]), (1 6) 

where O is a cumulative probability density function for 
the normal distribution with unit covariance, q is the standard 
deviation of the zero-mean Gaussian noise w(t) and functions 
of the vectors are computed component-wise. 

The embodiments discussed in some detail above and other 
possible embodiments of this invention formulate the fault 
state estimation problem as convex optimization problem 
(2)-(3). The convexity of problem (2)-(3) provides a guaran- 
tee that a global optimal solution X k of the problem (2)- (3) 
can be efficiently computed using an optimizer 250. One 
embodiment employs optimizer using an interior-point 
method. Another embodiment implements the method by 
using commercially available convex optimizers/solvers such 
as Mosek or SeDuMi. There are existing optimizers that solve 
the convex problems of the well-known classes such as QP, 
Linear Program (LP), Second-Order Conic Program (SOCP), 
and other such. For on-line implementations, approaches to 
designing an interior-point convex optimizer are known to 
one versed in the art and published in numerous books on the 
subject. An important advantage of the proposed invention is 
that it can use optimizer 250 designed and implemented as a 
separate function. Some convex problem embodiments dis- 
cussed below allow for special case solutions that are simple 
and faster than solutions obtained using a generic convex 
optimizer; one such solution for isotonic regression is men- 
tioned below. 

In the preferred embodiment illustrated in FIG. 3, opti- 
mizer 250 is shown to provide the fault state estimate X and 
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the likelihood p as the outputs 260. The vector X is obtained 
as an optimal decision vector in the optimization problem 
(2) -(3) or is made of some components of the optimal deci- 
sion vector. The likelihood value p is obtained as an optimal 
value of the log-likelihood index in the optimization problem 
(2) -(3) or as a simple transformation of such optimal value. In 
one embodiment, the log-likelihood ratio is converted to 
probabilistic likelihood by taking an exponent to convert 
from log-probabilities to probabilities and then multiplying 
by a scaling factor to normalize the probabilities of the 
complementary events such that they add up to one. The 
obtained estimates X and p 260 are added to the set of the 
accumulated estimates 275 and the computations 230 are 
repeated for the next fault 280 until all the faults are exhausted 
290. After the completion of the computations 295, accumu- 
lated estimates 275 are included with the outputs of the diag- 
nostic system 10. 

SRM Case Breach Detection for CLV 

The first preferred embodiment of the invention is dis- 
cussed below in regard to early detection of Solid Rocket 
Motor (SRM) case breach detection for crew launch abort in 
a crew launch vehicle. Ares I crew launch vehicle (CLV), 
which is being developed by NASA, with a case breach fault 
in the vehicle SRM is illustrated in FIG. 4. The SRM serves as 
the first stage of the vehicle. Ares I CLV is the first ever 
human-rated vehicle with the first stage using exclusively a 
solid propellant rocket. A timely detection of incipient loss of 
the launch vehicle control could allow the crew to escape by 
ejecting the crew capsule. One of the most important failure 
modes leading to loss of vehicle control is caused by hot gases 
escaping from the SRM combustion chamber through a case 
breach; the breach could occur through a joint between SRM 
segments, through a nozzle joint, or through a igniter seal. 
The hot gases escaping through the breach create lateral thrust 
augmentation, with the resulting tilting moment possibly 
leading to loss of control. By using Thrust Vector Control 
(TVC) gimballing of the main nozzle, the flight control sys- 
tem would counters attitude disturbances; this could mask the 
moment augmentation caused by the case breach fault till it 
overwhelms the TVC and the control is lost. 

FIG. 4 illustrates the case breach fault 305 in the first stage 
SRM 300 of a CLV; T^ 310 is the thrust augmentation force, 
d 320 is the distance between the case breach location and the 
rocket center-of-gravity (CG), M z 315 is the tilting moment 
created by the force T A 310 that is aligned with the lateral axis 
y. The attached axis is longitudinal and axis z complements y 
and x. In the first preferred embodiment, the method of the 
proposed invention is aimed at accurate and fast detection of 
the case breach fault using Guidance Navigation and Control 
(GN&C) sensors available on board of the rocket. In one 
embodiment the diagnostic system is implemented in the 
vehicle on-board avionics and performs automated decision 
on initialing crew abort; if sufficient time is available the 
diagnostic estimated could be presented to the pilot display 
for the pilot to make the decision to abort the mission and 
eject crew capsule 325. In another embodiment, the diagnos- 
tic system transmits the data to the ground mission manage- 
ment operators who would make the decision. Both embodi- 
ment implement the proposed invention; the applicability is 
defined by acceptable decision delay. The problem of case 
breach fault estimation is highly nonlinear; the location of the 
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breach and the breach intensity are unknown. The proposed 
invention allows addressing this problem. 

FIG. 5 illustrates computation of parity variables 350 in 
this embodiment, which is the first step in applying the pro- 
5 posed method. The vehicle data used for the parity variable 
computations are comprised of 6 accelerations (3 rotational 
and 3 linear), 3 linear velocities, 3 angular rates, 3 attitude 
angles, altitude (for characterizing air flow), 2 TVC gimbal 
angles, and flight time (for scheduling). Instantaneous values 
10 of linear and angular accelerations are measured by acceler- 
ometers in an IMU (inertial measurement unit) of the vehicle. 
The flow of the computations is illustrated in FIG. 5. One 
shown input to the computations is the vector of six accelera- 
15 tions 355. Another shown input 360 collects the remaining 
sensor channels describing the dynamical states of the vehicle 
rigid-body model. The flight dynamics model is used to com- 
pute the total moments and forces acting on the vehicle in the 
attached coordinate axes 365. Knowing the vehicle inertia 
20 tensor and the vehicle mass, the three angular and three linear 
accelerations of the vehicle in the attached axes are calculated 
from Euler’s equations and Newton’s law. The computed 
accelerations are subtracted 370 from the measured accelera- 
tions 355 to yield the 6-component acceleration residual vec- 
25 tor y(t) 375. In this embodiment the parity variables are the 
acceleration residuals. Based on the model they should be 
zero if there is no fault. In fact, they are impacted by sensor 
noise, thrust variation, airflow turbulence, and modeling 
errors. To obtain the moving horizon data set 125, the accel- 
30 eration residuals y(t) 375 are collected over the horizon of 
length n as described by (4). 

FIG. 6 illustrates the fault model in this embodiment. The 
thrust augmentation is specified by three parameters: magni- 
35 tude x(t) 380, circumferential angle of the case breach loca- 
tion, (3^ 390, and longitudinal coordinate of the breach loca- 
tion d A 385. Though, in fact, the case breach can be located in 
a continuum of possible locations, the preferred embodiment 
of this invention assumes that it is located in one of K discrete 
40 locations 395 illustrated in FIG. 6. Fault signature models for 
each of the discrete locations are computed based on the 
geometry as follows. In this embodiment each fault k corre- 
sponds to hypothesis H k that the breach location is at the 
known point described by circumferential angle, 385 and 
45 longitudinal coordinate d A 390 

H k :{£ A =V> A ,k;dA=d A , k }(k= 1, . . . A) (17). 

For given fault hypothesis and assumed breach location, 

the thrust augmentation magnitude T^=x(t) 380 is an time- 
50 dependent unknown variable. A MAP estimate of x(t) from 
the data is obtained used within H^. The null (‘no -fault’) 
hypothesis H 0 assumes that there is no fault and x(t)sO. Diag- 
nostic system 10 must determine which of the fault hypoth- 
esis holds: whether the fault has occurred and, if yes, what is 
55 the fault location index k. For the fault hypothesi s H k the fault 
data vector X k has form (6). 

For each candidate fault k, formulator 240 formulates a 
convex problem of optimizing the negative log-likelihood 
index in the form (2)-(3). Probabilistic model ( 1 ) for the index 
60 comprises the prior monotonic random walk model of the 
form (7), where v(t) is one-sided exponentially distributed 
noise. Such model reflects the prior knowledge that the 
breach grows irreversibly. Probabilistic model (1) further 
65 comprises observation model of the form (6) where the fault 
signature can be calculated as the effect of the thrust augmen- 
tation force 385 in FIG. 6 on the acceleration residuals 375. 
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where m is the rocket mass, \ zz and I are the main 
moments of inertia, and p A is a nondimensional coefficient 
describing the longitudinal thrust decrease in proportion to 
the lateral thrust augmentation. (The main thrust decreases 
because the combustion products escaping through the 
breach do not participate creating the main longitudinal thrust 
of the SRM). 

The probabilistic model (6)-(7), (18) yields the convex 
optimization problem formulation (8)-(9), which is an iso- 
tonic regression problem. This problem can be efficiently and 
very quickly solved by a convex optimizer 250 implementing 
one of the known linear-time isotonic regression solutions, 
such as the PAVA algorithm. 

The described embodiment of the invention was imple- 
mented and validated in a detailed simulation of Ares 1 CLV 
flight. FIG. 6 illustrates simulation results. The simulation 
included 6-DOF (degree-of- freedom) kinematics and dynam- 
ics, TVC actuator model, aerodynamic tables obtained from 
CFD analysis, SRM thrust augmentation, and a 10% random 
variation of the SRM thrust, which creates acceleration) itter. 
In simulation, a case breach fault was introduced as a ramp 
starting at simulation time 1 0 sec and reaching a steady state 
value at time 1 6 sec. FIG. 7 shows the six accelerations 355 as 
dashed lines 400, 405, 410 (angular accelerations), and 415, 
420, 425 (linear accelerations). Acceleration residuals 375 
are shown as solid lines 430, 435, 440 (the angular) and 445, 
450, 455 (the linear). 

In the simulation, the estimation algorithm worked with the 
acceleration residual data sampled at 200 ms period over the 
moving horizon of n=50 samples. The proposed method cor- 
rectly estimated the magnitude and location of the seeded 
fault with a 1.5 sec delay after the start of the ramp. The 
proposed method provided for a superior quality of estima- 
tion. The correct location and intensity of the case breach 
were determined reliably and fast despite the substantial 
noise contamination of the data. 

FIG. 8 illustrates the estimation results obtained with the 
proposed method by showing the traces of the loss indexes 
computed at each time step. The solid line 460 shows the 
point-wise minimum for all fault hypotheses, min ^I , v The 
upper dashed curve 465 shows the loss index L 0 for the 
4 no -fault’ hypothesis. The curve with the triangular markers 
470 shows the 4 no -fault’ hypothesis index L 0 which is offset 
by AL=log(l-P 1 )/P 1 ; where P x is the probability of the case 
breach fault and (1-Pj is the complementary probability of 
the 4 no-fault’ state. The fault is detected when the solid line 
460 crosses the triangle marked line 470. The algorithm had 
very good computational performance with the computation 
time of a few milliseconds on a PC computer. 

Turbine Engine Diagnostics 

The second preferred embodiment of the proposed inven- 
tion is discussed with respect to a diagnostic system for tur- 
bine propulsion engine for aircraft. FIG. 9 illustrates the 
process of diagnostic estimation for a turbine engine with the 
proposed method. Turbine engine 500 is equipped with elec- 
tronic control unit 510, which is interfaced with the engine 
sensors. FIG. 9 shows that the functionality of electronic 
control unit 510 includes estimation of engine efficiency 
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parameters 515; this estimation yields deviations of the 
engine efficiency parameters, such as compressor efficiency, 
turbine efficiency, etc, from the nominal values. The outputs 
u(t) of parameter estimation 515 are considered as parity 
5 parameters 525. Electronic control unit 510 further includes 
BIT functions and BIT processing logic shown as a block 520 
that estimates discrete fault flags and generates fault codes 
z(t) 530. The codes z(t) are zero (or absent) in the absence of 
faults and can be considered as discrete parity parameters, 
to In the second preferred embodiment the moving horizon 
has length one, only the data from the last time period are 
considered. Data vector 125 combines the continuous 
and discrete parity parameters. 
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Y(r) = y(t) = 


' u(t ) ' 

. Z{t ) . ' 


(18) 


20 The diagnostic system produces the estimates 540 of the 
fault condition intensities X A1 , . . . and likelihood param- 
eters p£ 1? . . . pj^ for each of the candidate fault conditions. The 
estimates 540 are obtained by module 535 as the maximum 
posteriori probability estimates (1). The optimization pro- 
25 gram 250 (2)-(3) is formulated 230 by using observation 
models of the form (6) for the continuous components of the 
parity vector y(t) (18) and observation models of the form 
(13) for the discrete components of vector y(t) (18). Fault 
signatures 210 are available from a detailed simulation model 
30 of the engine. In this embodiment Gaussian priors are used for 
the fault intensity: x~x^N(m^,Rj, where index k numbers 
components of vector Y(t) (18) and fault types. 

For each potential fault k, an optimization problem (2) is 
formulated by formulator 230; there are no constraints (3). 
35 The log-likelihood optimization problem solved by optimizer 
250 has a single decision parameter: fault intensity x k . The 
problem is convex and the optimal solution is obtained by 
optimizer using a dichotomy algorithm to find the zero of the 
gradient of the loss index. 

40 In this embodiment, the computed estimates 150 of the 
turbine engine fault conditions are transmitted to a display 
device 70 through a decision support system 90, or to an 
automated decision and control system 80, or stored in 
memory for subsequent transmission and analysis. The diag- 
45 nostic estimates can be performed by software implemented 
in on-board avionics attached to the engine. The estimates can 
be used during the flight as a pilot warning and/or stored till 
the end of the flight. The diagnostic estimates can be also 
performed in on-ground computers that receive sensor data 
50 snapshots obtained during the flight. The ground processing 
would indicate a possibility of a particular fault and provide 
maintenance guidance. 

FIG. 10 illustrates results for the method implemented for 
a detailed simulation of a military engine. In this example 
55 fault #1 was seeded with magnitude x 1 =0.1. The following 
diagnostic report was obtained for data in FIG. 10. 


60 

Fault 

Likelihood 

Intensity 


FAULT #1 

p = 0.587 

x = 0.109 


FAULT #2 

p = 0.413 

x = 0.052 


NO FAULT 

p = 0.000 

x = 0 


65 This report shows that the fault # 1 was correctly identified. 
It also shows that fault #2 has a close but smaller likelihood. 
Depending on the chosen threshold this second fault can be 
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included into the ambiguity group. The ambiguity is caused 
by large noise. The noise distortion of the data is illustrated by 
the difference between the last two rows in the table of FIG. 

10 . 

Predictive Maintenance of Semiconductor Manufacturing 
Tool 

The third preferred embodiment is discussed below with 
respect to a semiconductor manufacturing tool. Though the 
specific discussion is for an etch tool, one versed in the art 
would recognize its full applicability to other tools, including 
but not limited to a CVD (chemical vapor deposition) tool, 
CMP (chemical -mechanical polishing), ion beam implanta- 
tion, lithography tools, and other tools used in semiconductor 
manufacturing. 

Semiconductor manufacturing tools are used in the pro- 
cesses of manufacturing IC (integrated circuits). Such tools 
are complex machines equipped with multiple sensors 30, 
actuators 40, and embedded electronics 50. Control system 
60 of such tool is operating at two time scales. At the fast time 
scale is the tool is controlled and monitored while processing 
a single batch of silicon wafers. At a slower time scale, the 
tool is controlled and monitored from run to run; this is known 
as R2R control. In the third preferred embodiment, diagnos- 
tics system 10 processes the data obtained over multiple runs. 
In this embodiment, the time period t is the sequential run 
number. The diagnostic estimates are transmitted into a deci- 
sion support system 90 that provides maintenance recom- 
mendations for the tool through operator display 70. 

The semiconductor manufacturing tools usually imple- 
ment a fixed recipe for long periods of time. In this embodi- 
ment, tool R2R data 100 are used to compute parity variables 
110 as deviations of the monitored parameters from the 
recipe. 

An example of the third preferred embodiment is discussed 
below is its implementation for an etch tool. An engineering 
design of the diagnostic and monitoring system requires a 
selection of the monitored variables and selection of the fault 
parameters to be estimated. Such selection is driven by fault 
frequency and impact as well as by ability to estimate the 
faults from the monitored variables. The selection is outside 
of the method proposed by this invention. For the etch tool 
example we consider the following four monitored variables: 
‘REFLECTED RF POWER’, ‘ELECTRODE COOLING 
TEMPERATURE’, ‘PRESSURE CONTROL VALVE’, and 
‘ENDPOINT DETECTOR’. The deviations of these vari- 
ables from the recipe at run t comprise the observation vector 
y(t). The proposed diagnostic estimation method monitors 
(and estimated the intensity of) the following three potential 
faults: ‘HIGH REFLECTED RF POWER’, ‘LOWER ELEC- 
TRODE TEMPERATURE’, and ‘CHAMBER LEAK’. The 
intensities of these faults correspond to deviations of the 
respective parameters from their recipe values and comprise 
the fault vector x(t). 

FIG. 11 illustrates the fault signatures $> k relating y(t) and 
x(t) in accordance with (6). The upper plot 600 in FIG. 11 
illustrates the fault signature S x for ‘HIGH REFLECTED RF 
POWER’ fault. The middle plot 605 in FIG. 11 illustrates the 
fault signature S 2 for ‘LOWER ELECTRODE TEMPERA- 
TURE’ fault. The lower plot 610 in FIG. 11 illustrates the 
fault signature S 2 for ‘LOWER ELECTRODE TEMPERA- 
TURE’ fault. The bar height illustrates the value of the sig- 
nature vector component. The bar numbers correspond to the 
monitored variables: 620 bar 1 to ‘REFLECTED RF 
POWER’, 625 bar 2 to ‘ELECTRODE COOLING TEM- 
PERATURE’, 630 bar 3 to ‘PRESSURE CONTROL 
VALVE’, and 635 bar 4 to ‘ENDPOINT DETECTOR’. 
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In this third preferred embodiment example the moving 
horizon data are collected over a horizon 120 of n=200 runs. 
Such and larger horizons are possible since the diagnostic 
estimation processing does not need to be faster than the run 
5 duration. The moving horizon data Y(t) 1 25 of the form (4) i s 
provided as one of the inputs to formulator 230. Another input 
is the fault signature S^ illustrated in FIG. 11; a different 
signature at each cycle k 200, 280, 290, 285, 295. 

The optimization problem formulated by formulator 230 is 
to based on the trend model of the form (10)- (12). This 
model assumes that process noise v 2 (t)=0 and process 
noise v^t) is Laplacian with covariance R. The model 
leads to a MAP negative log-likelihood index L=L(X /c , 
U), where X jt =col[x 2 (t-n) . . . x 2 (t)] and U=col[x 1 
1 5 (t-n) . . . x ! (t)] . The formulated index optimization prob - 

lem has the form 

minimize ViL t [\ ly(?) (0 1 1 gl 2 ,/V (t) -x l (t- 1 )] , (19) 

subject to x 2 {t)=x i {t-l)+x 2 {t-l) (20). 

The problem (19)-(20) is a QP program and can be solved 
by using a standard QP solver in optimizer 250. 

FIG. 12 depicts four data plots as a function of run number, 
including reflected RF power 640, lower electrode tempera- 
25 ture 645, pressure control valve 650 and endpoint detector 
655. Specifically, FIG. 12 illustrates example R2R data for 
etch tool; the tool experiences a fault which is reflected in the 
data but is difficult to detect without data processing. The data 
in FIG. 12 are normalized nondimensional variables. These 
30 data were obtained in a simulation and reflect a ‘LOWER 
ELECTRODE TEMPERATURE’ fault combined with ran- 
dom noise and a structured noise variation. Using the pro- 
posed method for fault estimation and computing the sorted 
fault intensities and likelihoods 150 yields the correct result. 
35 Fault with index k=2 was correctly determined to have the 
highest likelihood p*. This ‘LOWER ELECTRODE TEM- 
PERATURE’ fault was determined to be about 20% more 
likely than any other fault hypothesis or ‘NO FAULT’ hypoth- 
esis. 

40 The computed decision vector X 2 comprised of the fault 
estimate time series x 2 (t) 665 is illustrated in FIG. 13 in 
comparison with the seeded fault time series 660. FIG. 13 
shows that the seeded fault 660 at the last time period and the 
seeded fault time series 665 over previous time periods t are 
45 estimated with good accuracy from the noisy data. 

Other Applications of the Proposed Method 

The proposed invention can be used in many other appli- 
cations outside of the presented preferred embodiments. 
These applications include but are not limited to HVAC (heat- 
50 ing, ventilation, and air conditioning equipment) such as air 
conditioners, heaters, chillers, and refrigerators; oil drilling 
rigs, aircraft systems; ground vehicles (cars, tracks, and mili- 
tary vehicles) and their systems; industrial manufacturing 
processes, such as refineries and pulp and paper plants; and 
55 other. 

Exemplary Computer Platform 

FIG. 14 is a block diagram that illustrates an embodiment 
of a computer/server system 1400 upon which an embodi- 
ment of the inventive methodology may be implemented. The 
60 system 1400 includes a computer/server platform 1401, 
peripheral devices 1402 and network resources 1403. Perifi- 
eral devices 1402 may be absent if computer system 1400 is 
implemented as an embedded system, e.g., as an embedded 
control and monitoring system which is integrated with the 
65 apparatus. 

The computer platform 1401 may include a data bus 1404 
or other communication mechanism for communicating 
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information across and among various parts of the computer 
platform 1401, and a processor 1405 coupled with bus 1404 
for processing information and performing other computa- 
tional and control tasks. Computer platform 1401 also 
includes a volatile storage 1406, such as a random access 
memory (RAM) or other dynamic storage device, coupled to 
bus 1404 for storing various information as well as instruc- 
tions to be executed by processor 1405. The volatile storage 
1406 also may be used for storing temporary variables or 
other intermediate information during execution of instruc- 
tions by processor 1405. Computer platform 1401 may fur- 
ther include a read only memory (ROM or EPROM) 1407 or 
other static storage device coupled to bus 1404 for storing 
static information and instructions for processor 1405, such 
as basic input-output system (BIOS), as well as various sys- 
tem configuration parameters. A persistent storage device 
1408, such as a magnetic disk, optical disk, or solid-state flash 
memory device is provided and coupled to bus 1404 for 
storing information and instructions. 

Computer platform 1401 may be coupled via bus 1404 to a 
display 1409, such as a cathode ray tube (CRT), plasma 
display, or a liquid crystal display (LCD), for displaying 
information to a system administrator or user of the computer 
platform 1401. An input device 1410, including alphanu- 
meric and other keys, is coupled to bust 1404 for communi- 
cating information and command selections to processor 
1405. Another type of user input device is cursor control 
device 1411, such as a mouse, a trackball, or cursor direction 
keys for communicating direction information and command 
selections to processor 1404 and for controlling cursor move- 
ment on display 1409. This input device typically has two 
degrees of freedom in two axes, a first axis (e.g., x) and a 
second axis (e.g., y), that allows the device to specify posi- 
tions in a plane. 

An external storage device 1412 may be connected to the 
computer platform 1401 via bus 1404 to provide an extra or 
removable storage capacity for the computer platform 1401. 
In an embodiment of the computer system 1400, the external 
removable storage device 1412 may be used to facilitate 
exchange of data with other computer systems. 

The invention is related to the use of computer system 1400 
for implementing the techniques described herein. In an 
embodiment, the inventive system may reside on a machine 
such as computer platform 1401. According to one embodi- 
ment of the invention, the techniques described herein are 
performed by computer system 1400 in response to processor 
1405 executing one or more sequences of one or more instruc- 
tions contained in the volatile memory 1406. Such instruc- 
tions may be read into volatile memory 1406 from another 
computer-readable medium, such as persistent storage device 
1408. Execution of the sequences of instructions contained in 
the volatile memory 1406 causes processor 1405 to perform 
the process steps described herein. In alternative embodi- 
ments, hard-wired circuitry may be used in place of or in 
combination with software instructions to implement the 
invention. Thus, embodiments of the invention are not limited 
to any specific combination of hardware circuitry and soft- 
ware. 

The term “computer-readable medium” as used herein 
refers to any medium that participates in providing instruc- 
tions to processor 1405 for execution. The computer-readable 
medium is just one example of a machine-readable medium, 
which may carry instructions for implementing any of the 
methods and/or techniques described herein. Such a medium 
may take many forms, including but not limited to, non- 
volatile media or volatile media. Non-volatile media 
includes, for example, optical or magnetic disks, such as 
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storage device 1408. Volatile media includes dynamic 
memory, such as volatile storage 1406. 

Common forms of computer-readable media include, for 
example, a floppy disk, a flexible disk, hard disk, magnetic 
5 tape, or any other magnetic medium, a CD-ROM, any other 
optical medium, punchcards, papertape, any other physical 
medium with patterns of holes, a RAM, a PROM, an EPROM, 
a FLASH-EPROM, a flash drive, a memory card, any other 
memory chip or cartridge, a carrier wave as described here- 
1 0 inafter, or any other medium from which a computer can read. 

Various forms of computer readable media may be 
involved in carrying one or more sequences of one or more 
instructions to processor 1405 for execution. For example, the 
15 instructions may initially be carried on a magnetic disk from 
a remote computer. Alternatively, a remote computer can load 
the instructions into its dynamic memory and send the 
instructions over a telephone line using a modem. A modem 
local to computer system 1400 can receive the data on the 
20 telephone line and use an infra-red transmitter to convert the 
data to an infra-red signal. An infra-red detector can receive 
the data carried in the infra-red signal and appropriate cir- 
cuitry can place the data on the data bus 1404. The bus 1404 
carries the data to the volatile storage 1406, from which 
25 processor 1405 retrieves and executes the instructions. The 
instructions received by the volatile memory 1406 may 
optionally be stored on persistent storage device 1408 either 
before or after execution by processor 1405. The instructions 
may also be downloaded into the computer platform 1401 via 
30 Internet using a variety of network data communication pro- 
tocols well known in the art. 

The computer platform 1401 also includes a communica- 
tion interface, such as network interface card 1413 coupled to 
the data bus 1404. Communication interface 1413 provides a 
35 two-way data communication coupling to a network link 
1414 that is connected to a local network 1415. For example, 
communication interface 1413 may be an integrated services 
digital network (ISDN) card or a modem to provide a data 
communication connection to a corresponding type of tele- 
40 phone line. As another example, communication interface 
1413 may be a local area network interface card (LAN NIC) 
to provide a data communication connection to a compatible 
LAN. Wireless links, such as well-known 802.11a, 802.11b, 
802. 1 1 g and Bluetooth may also used for network implemen- 
45 tation. In embedded avionics implementations of the net- 
work, one of the standard backplane data buses such as, 
ARINC 629 or an optical avionics data bus may be used. A 
TTP data bus may also be used, such as in automotive and 
aerospace applications. In any such implementation, commu- 
50 nication interface 1413 sends and receives electrical, electro- 
magnetic or optical signals that carry digital data streams 
representing various types of information. 

Network link 1413 typically provides data communication 
through one or more networks to other network resources . For 
55 example, network link 1414 may provide a connection 
through local network 1415 to a host computer 1416, or a 
network storage/server 1422. Additionally or alternatively, 
the network link 1413 may connect through gateway/firewall 
1417 to the wide-area or global network 1418, such as an 
60 Internet. Thus, the computer platform 1401 can access net- 
work resources located anywhere on the Internet 1418, such 
as a remote network storage/server 1419. On the other hand, 
the computer platform 1401 may also be accessed by clients 
located anywhere on the local network 1415 and/or the Inter- 
65 net 1418. The network clients 1420 and 1421 may themselves 
be implemented based on the computer platform similar to 
the platform 1401. 
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Local network 1415 and the Internet 1418 both use elec- 
trical, electromagnetic or optical signals that carry digital data 
streams. The signals through the various networks and the 
signals on network link 1414 and through communication 
interface 1413 , which carry the digital data to and from com- 
puter platform 1401 , are exemplary forms of carrier waves 
transporting the information. 

Computer platform 1401 can send messages and receive 
data, including program code, through the variety of 
network(s) including Internet 1418 and local network 1415 , 
network link 1414 and communication interface 1413 . In the 
Internet example, when the system 1401 acts as a network 
server, it might transmit a requested code or data for an 
application program running on client(s) 1420 and/or 1421 
through Internet 1418 , gateway/firewall 1417 , local network 
1415 and communication interface 1413 . Similarly, it may 
receive code from other network resources. 

The received code may be executed by processor 1405 as it 
is received, and/or stored in persistent or volatile storage 
devices 1408 and 1406 , respectively, or other non-volatile 
storage for later execution. In this manner, computer system 
1401 may obtain application code in the form of a carrier 
wave. 

Finally, it should be understood that processes and tech- 
niques described herein are not inherently related to any 
particular apparatus and may be implemented by any suitable 
combination of components. Further, various types of general 
purpose devices may be used in accordance with the teach- 
ings described herein. It may also prove advantageous to 
construct specialized apparatus to perform the method steps 
described herein. The present invention has been described in 
relation to particular examples, which are intended in all 
respects to be illustrative rather than restrictive. Those skilled 
in the art will appreciate that many different combinations of 
hardware, software, and firmware will be suitable for prac- 
ticing the present invention. For example, the described soft- 
ware may be implemented in a wide variety of programming 
or scripting languages, such as Assembler, VHDL, C/C++, 
Matlab/Simulink, Labview, python, perl, Java, etc. 

Moreover, other implementations of the invention will be 
apparent to those skilled in the art from consideration of the 
specification and practice of the invention disclosed herein. 
Various aspects and/or components of the described embodi- 
ments may be used singly or in any combination in the inven- 
tive diagnostic and monitoring system. It is intended that the 
specification and examples be considered as exemplary only, 
with a true scope and spirit of the invention being indicated by 
the following claims. 

What is claimed is: 

1. A method for computing diagnostic estimates for faults 
of an apparatus with condition sensors connected to a com- 
puter; 

the method comprising: 

processing data from the condition sensors to obtain a 
set of parity parameters y reflecting apparatus condi- 
tion deviation from normality at time period t, 
wherein the processing is performed by a processing 
module programmed for processing data from the 
sensors, 

collecting the parity parameters y over a moving horizon 
interval of time of a fixed maximal duration and end- 
ing at time period t in a data vector Y(t), wherein the 
collecting is performed by a collector module config- 
ured for collecting the parity parameters y, 
computing estimates of at least one fault condition and 
likelihood parameters for each of the at least one fault 
condition, wherein computing is performed using a 
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computing module configured for computing esti- 
mates of fault conditions and likelihood parameters 
for each of the fault conditions, and 
transmitting the computed estimates of the fault condi - 
5 tions to the display device or to an automated decision 

and control system or storing the estimates in the 
memory, wherein the transmitting is performed using 
a transmitting module configured for transmitting the 
computed estimates of the fault conditions; 

1 0 wherein a fault condition k at time period t is characterized 

by fault intensity parameter x^(t); 

computing estimates of the fault intensity parameter x*(t) 
over the moving horizon interval of time and likelihood 
15 parameters p^ for each fault condition k, said computa- 
tion being done for one fault condition k at a time, said 
computation further being performed in two steps, the 
first step being a formulator step and the second step 
being an optimizer step, 

20 wherein the formulator step formulates a convex opti- 
mization program for a fault condition using the data 
vector Y(t), and the fault signature corresponding to 
the fault condition k, 

wherein the optimizer step numerically finds a solution 
25 of the convex optimization program encoded by the 

formulator step, the solution being computed with a 
pre-defined accuracy for fault condition k; 

and whereby the computed estimates for faults comprises 
estimates of fault condition intensity parameters x^(t) 
30 over the moving horizon interval of time computed as 

an optimal solution or as a transformation of the solu- 
tion, and 

likelihood parameter p^ computed as an optimum value 
of the program or as a transformation of the optimum 
35 value. 

2. The method of claim 1 wherein the formulator step 
further comprises formulating and the optimizer step further 
comprises solving a convex optimization program of either: 

a) an isotonic or monotonic regression program, 

40 b) a univariate convex program, 

c) a Quadratic Program, 

d) a Linear Program, 

e) a Second-order Cone Program, 

f) a constrained convex optimization program, or 

45 g) a convex optimization program having a known opti- 
mizer solver. 

3 . A method of claim 1 wherein the convex optimization 
program further comprises additional decision variables in 
addition to the fault intensity parameters x^(t). 

50 4 . The method of claim 1 wherein the parity parameters y 

further comprise prediction residuals obtained as a difference 
of obtained readings from the sensor and readings predicted 
for an apparatus model which receives the same inputs as the 
apparatus; the apparatus model comprising either a dynamic 
55 model, a nonlinear map, a set of static values corresponding to 
a chosen steady state regime, or another computer simulation 
model of the apparatus. 

5 . The method of claim 1 wherein fault signatures represent 
responses observed in the data y when a fault occurs. 

60 6. The method of claim 1 wherein the method is imple- 

mented on-line in a computer or computers connected to the 
sensors of the apparatus or implemented off-line by collect- 
ing data from the apparatus, transmitting it by electronic 
means to a computer implementing the method, and perform- 
65 ing the method computations at a later time. 

7 . The method of claim 1 wherein the fault condition 
parameters and likelihood parameters computed by the opti- 
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mizer are used for improving safety of the apparatus opera- 
tion, or for improving apparatus performance, or for sched- 
uling a maintenance action. 

8. The method of claim 1 where the formulator step further 
comprises formulating and the optimizer step further com- 5 
prises solving the convex problem when one or more of the 
components of vector Y(t) is missing or unavailable. 

9. A system for computing diagnostic estimates for faults 

of an apparatus with condition sensors connected to a com- 
puter; 10 

the system comprising: 

a processing module programmed for processing data 
from the sensors to obtain a set of parity parameters y 
reflecting apparatus condition deviation from normal- 
ity at time period t, 15 

a collector module configured for collecting the parity 
parameters y over a moving horizon interval of time of 
a fixed maximal duration and ending at time period t 
in a data vector Y(t), 

a computing module configured for computing esti- 20 
mates of fault conditions and likelihood parameters 
for each of the fault conditions, and 
a transmitting module configured for transmitting the 
computed estimates of the fault conditions to a dis- 
play device or to an automated decision and control 25 
system or storing the estimates in memory; 

wherein a fault condition k at time period t is characterized 
by fault intensity parameter x^(t), 

a computing circuit computing fault intensity parameters 
x^(t) over the moving horizon interval of time and like- 30 
lihood parameters p^ for each fault condition k, said 
computing is done for one fault condition k at a time, 
said computing is performed in two steps, the first step 
being a formulator step and the second step being an 
optimizer step, 35 

the formulator step formulates a convex optimization 
program for fault condition using the moving horizon 
data vector Y (t), and the fault signature corresponding 
to the fault condition k, 

the optimizer step numerically finds the solution of the 40 
convex optimization program encoded by the formu- 
lator, the solution is computed with a pre-defined 
accuracy for fault condition k; 

whereby the diagnostic estimates for faults comprises 

estimates of fault condition intensity parameters x^(t) 45 
over the moving horizon interval of time computed as 
the optimal solution or as a transformation of the said 
solution, and 

likelihood parameter p k computed as the optimum value 
of the program or as a transformation of the said 50 
optimum value. 

10. A system of claim 9 wherein the formulator step for- 
mulates and the optimizer step solves one of the following 
optimization programs 

a) an isotonic or monotonic regression program 55 

b) a univariate convex program 

c) a Quadratic Program 

d) a Linear Program 

e) a Second-order Cone Program, 

f) a constrained convex optimization program, or 60 

g) a convex optimization program having a known opti- 
mizer solver. 

11 . The system of claim 9 wherein the convex program for 

the fault condition includes additional decision variables in 
addition to the fault intensity parameters x^(t). 65 

12. The system of claim 9 wherein the parity parameters y 
further comprise prediction residuals obtained as a difference 
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of the obtained sensor readings and the readings predicted for 
an apparatus model which receives the same inputs as the 
apparatus; the apparatus model comprising either a dynamic 
model, a nonlinear map, a set of static values corresponding to 
a chosen steady state regime, or another computer simulation 
model of the apparatus. 

13. The system of claim 9 wherein fault signatures repre- 
sent responses observed in the parity parameters y when a 
fault occurs. 

14. The system of claim 9 wherein the system is imple- 
mented on-line in a computer or computers connected to the 
sensors of the apparatus or implemented off-line by collect- 
ing data from the apparatus, transmitting it by electronic 
means to a computer implementing the method, and perform- 
ing the method computations at a later time. 

15. The system of claim 9 wherein the fault condition 
parameters and likelihood parameters computed by the opti- 
mizer are used for improving safety of the apparatus opera- 
tion, or for improving apparatus performance, or for sched- 
uling a maintenance action. 

16. The system of claim 9 wherein the formulator step 
formulates and the optimizer step solves the convex problem 
when one or more of the components of vector Y (t) is missing 
or unavailable. 

17. A tangible computer readable medium embodying a set 
of computer-executable instructions, which, when executed 
on a computer, implements a method for computing diagnos- 
tic estimates for faults of an apparatus with condition sensors 
connected to a computer; 

the method comprising: 

processing data from the condition sensors to obtain a 
set of parity parameters y reflecting apparatus condi- 
tion deviation from normality at time period t, 
wherein the processing is performed by a processing 
module programmed for processing data from the 
sensors, 

collecting the parity parameters y over a moving horizon 
interval of time of a fixed maximal duration and end- 
ing at time period t in a data vector Y(t), wherein the 
collecting is performed by a collector module config- 
ured for collecting the parity parameters y, 
computing estimates of at least one fault condition and 
likelihood parameters for each of the at least one fault 
condition, wherein computing is performed using a 
computing module configured for computing esti- 
mates of fault conditions and likelihood parameters 
for each of the fault conditions, and 
transmitting the computed estimates of the fault condi- 
tions to a display device or to an automated decision 
and control system or storing the estimates in 
memory, wherein the transmitting is performed using 
a transmitting module configured for transmitting the 
computed estimates of the fault conditions; 

wherein a fault condition k at time period t is characterized 
by fault intensity parameter x*(t); 

computing estimates of the fault intensity parameter x^(t) 
over the moving horizon interval of time and likelihood 
parameters p* for each fault condition k, said computa- 
tion being done for one fault condition k at a time, said 
computation further being performed in two steps, the 
first step being a formulator step and the second step 
being an optimizer step, 

wherein the formulator step formulates a convex opti- 
mization program for a fault condition using the data 
vector Y(t), and the fault signature corresponding to 
the fault condition k, 
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wherein the optimizer step numerically finds a solution 
of the convex optimization program encoded by the 
formulator step, the solution being computed with a 
pre-defined accuracy for fault condition k; 

and wherein the computed estimates for faults comprises: 
estimates of fault condition intensity parameters x^(t) 
over the moving horizon interval of time computed as 
an optimal solution or as a transformation of the solu- 
tion, and 

likelihood parameter p^ computed as an optimum value 
of the program or as a transformation of the optimum 
value. 

18. The computer readable media of claim 17 wherein the 
formulator step further comprises formulating and the opti- 
mizer step further comprises solving a convex optimization 
program of either: 

a) an isotonic or monotonic regression program, 

b) a univariate convex program, 

c) a Quadratic Program, 

d) a Linear Program, 

e) a Second-order Cone Program, 

f) a constrained convex optimization program, or 

g) a convex optimization program having a known opti- 
mizer solver. 

19. The computer readable media of claim 17 wherein the 
convex optimization program further comprises additional 
decision variables in addition to the fault intensity parameters 
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20. The computer readable media of claim 17 wherein the 
parity parameters y further comprise prediction residuals 
obtained as a difference of obtained readings from the sensor 
and readings predicted for an apparatus model which receives 

5 the same inputs as the apparatus; the apparatus model com- 
prising either a dynamic model, a nonlinear map, a set of 
static values corresponding to a chosen steady state regime, or 
another computer simulation model of the apparatus. 

21 . The computer readable media of claim 17 wherein fault 

to signatures represent responses observed in the data y when a 

fault occurs. 

22. The computer readable media of claim 17 wherein the 
method is implemented on-line in a computer or computers 
connected to the sensors of the apparatus or implemented 

15 off-line by collecting data from the apparatus, transmitting it 
by electronic means to a computer implementing the method, 
and performing the method computations at a later time. 

23. The computer readable media of claim 17 where the 
fault condition parameters and likelihood parameters com- 

20 puted by the optimizer are used for improving safety of the 
apparatus operation, or for improving apparatus perfor- 
mance, or for scheduling a maintenance action. 

24. The computer readable media of claim 17 where the 
formulator step further comprises formulating and the opti- 

25 mizer step further comprises solving the convex problem 
when one or more of the components of vector Y (t) is missing 
or unavailable. 



